Learning Rewards From Linguistic Feedback

Abstract

We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, instead using aspect-based sentiment analysis to decompose feedback into sentiment over the features of a Markov decision process. We then infer the teacher's reward function by regressing the sentiment on the features, an analogue of inverse reinforcement learning. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict rewards. We then re-run our initial experiment, pairing human teachers with these artificial learners. All three models successfully learn from interactive human feedback. The inference network approaches the performance of the "literal" sentiment model, while the "pragmatic" model nears human performance. Our work provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning.
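The regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an upstream aspect-based sentiment step has already turned each utterance into a feature vector plus a scalar sentiment score, and it swaps in plain ridge regression fitted by batch gradient descent. The function name `infer_reward_weights`, the feature names, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def infer_reward_weights(
    features: Sequence[Sequence[float]],
    sentiments: Sequence[float],
    l2: float = 0.1,
    lr: float = 0.1,
    steps: int = 2000,
) -> List[float]:
    """Ridge regression of sentiment on features via batch gradient descent.

    Each returned weight is read as the estimated reward attached to one
    MDP feature (an assumption of this sketch, not a claim about the paper).
    """
    n, d = len(features), len(features[0])
    w = [0.0] * d
    for _ in range(steps):
        # Start from the gradient of the L2 penalty term.
        grad = [l2 * wj for wj in w]
        for x, s in zip(features, sentiments):
            # Prediction error for this utterance.
            err = sum(wj * xj for wj, xj in zip(w, x)) - s
            for j in range(d):
                grad[j] += err * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: three binary features and invented sentiment scores.
FEATURE_NAMES = ["pink", "blue", "square"]
feedback = [
    ([1, 0, 0], 0.8),   # e.g. "the pink ones are great"
    ([0, 1, 0], -0.6),  # e.g. "avoid the blue ones"
    ([1, 0, 1], 0.7),
    ([0, 1, 1], -0.5),
]
X = [f for f, _ in feedback]
y = [s for _, s in feedback]
weights = infer_reward_weights(X, y)

for name, weight in zip(FEATURE_NAMES, weights):
    print(f"{name}: {weight:+.3f}")
```

On this toy data the fitted weight for the positively described feature comes out positive and the weight for the negatively described feature comes out negative, which is the qualitative behavior the regression is meant to capture. The L2 penalty is a design choice of this sketch that keeps the fit stable when only a few utterances are available.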

Related resources

Beyond Rewards: Learning from Richer Supervision

Recently there has been some interest in the reinforcement learning community on learning from richer feedback from the environment rather than just a scalar reward signal. In this paper we look at the question of learning from sporadic instructions from a human. Instructions can take several forms, from complete specification of policies, to directing the agent’s search to specific parts of th...

Reinforcement Learning Without Rewards

Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to “interactive” prob...

Discretionary rewards as a feedback mechanism

Article history: Received 7 March 2006 Available online 19 March 2009 JEL classification: D82 J33 M50

Learning Analytics: Readiness and Rewards

This position paper introduces the relatively new field of learning analytics, first by considering the relevant meanings of both “learning” and “analytics,” and then by looking at two main levels at which learning analytics can be or has been implemented in educational organizations. Although integrated turnkey systems or modules are not yet available for review, specific technologies for anal...

Value and probability coding in a feedback-based learning task utilizing food rewards.

For the consequences of our actions to guide behavior, the brain must represent different types of outcome-related information. For example, an outcome can be construed as negative because an expected reward was not delivered or because an outcome of low value was delivered. Thus behavioral consequences can differ in terms of the information they provide about outcome probability and value. We ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i7.16749